    Time variance and defect prediction in software projects: Towards an exploitation of periods of stability and change as well as a notion of concept drift in software projects

    It is crucial for a software manager to know whether or not one can rely on a bug prediction model. A wrong prediction of the number or the location of future bugs can lead to problems in the achievement of a project's goals. In this paper we first verify the existence of variability in a bug prediction model's accuracy over time both visually and statistically. Furthermore, we explore the reasons for such a high variability over time, which includes periods of stability and variability of prediction quality, and formulate a decision procedure for evaluating prediction models before applying them. To exemplify our findings we use data from four open source projects and empirically identify various project features that influence the defect prediction quality. Specifically, we observed that a change in the number of authors editing a file and the number of defects fixed by them influence the prediction quality. Finally, we introduce an approach to estimate the accuracy of prediction models that helps a project manager decide when to rely on a prediction model. Our findings suggest that one should be aware of the periods of stability and variability of prediction quality and should use approaches such as ours to assess their models' accuracy in advance.

    Managing Temporal Graph Data While Preserving Semantics

    This thesis investigates the introduction of time as a first-class citizen to RDF-based knowledge bases as used by the Linked Data movement. By presenting EvoOnt, a use-case scenario from the field of software comprehension, we demonstrate a particular field that (1) benefits from the Semantic Web’s tools and techniques, (2) has a high update rate and (3) is a candidate dataset for Linked Data. EvoOnt is a set of OWL ontologies that cover three aspects of the software development process: a source code ontology that abstracts the elements of object-oriented code, a defect tracker ontology that models the contents of a defect database (a.k.a. bug tracker) and finally a version ontology that allows the expression of multiple versions of a source code file. In multiple experiments we demonstrate how Semantic Web tools and techniques can be used to perform common tasks known from software comprehension. Derived from this use case, we show how the temporal dimension can be leveraged in RDF data. Firstly, we present a representation format for the annotation of RDF triples with temporal validity intervals. We propose a special usage of named graphs in order to encode temporal triples. Secondly, we demonstrate how such a knowledge base can be queried using a temporal syntax extension of the SPARQL query language. Next, we present two indexing structures that speed up the processing and querying time of temporally annotated data. Furthermore, we demonstrate how additional knowledge can be extracted from the temporal dimension by matching patterns that contain temporal constraints. Together, these elements outline a method that can be used to make the datasets published as Linked Data more robust to possible invalidations through updates of linked datasets. Additionally, processing and querying can be improved through sophisticated index structures while deriving additional information from the history of a dataset.

    Time variance and defect prediction in software projects

    It is crucial for a software manager to know whether or not one can rely on a bug prediction model. A wrong prediction of the number or the location of future bugs can lead to problems in the achievement of a project’s goals. In this paper we first verify the existence of variability in a bug prediction model’s accuracy over time both visually and statistically. Furthermore, we explore the reasons for such a high variability over time, which includes periods of stability and variability of prediction quality, and formulate a decision procedure for evaluating prediction models before applying them. To exemplify our findings we use data from four open source projects and empirically identify various project features that influence the defect prediction quality. Specifically, we observed that a change in the number of authors editing a file and the number of defects fixed by them influence the prediction quality. Finally, we introduce an approach to estimate the accuracy of prediction models that helps a project manager decide when to rely on a prediction model. Our findings suggest that one should be aware of the periods of stability and variability of prediction quality and should use approaches such as ours to assess their models’ accuracy in advance.